NoSQL Cheatsheet

NoSQL Concepts Overview

Concept	Description
NoSQL	Refers to non-relational databases that can handle large volumes of rapidly changing data and scale out horizontally. They often relax strict ACID properties for performance and flexibility.
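CAP Theorem	States that a distributed data store cannot guarantee Consistency, Availability, and Partition Tolerance all at once. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts. Different NoSQL databases choose different trade-offs.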
BASE	“Basically Available, Soft state, Eventually consistent.” Many NoSQL databases follow BASE principles instead of strict ACID transactions to optimize for performance and scalability.
Horizontal Scalability	NoSQL databases typically scale by adding more nodes (sharding/partitioning data), rather than vertically scaling a single server.
Schema Flexibility	Most NoSQL databases do not enforce a rigid schema. The data model can evolve more easily as requirements change.
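
The horizontal scaling idea above can be sketched as simple hash-based sharding. This is a minimal illustration, not how any particular database implements it: the function names `shard_for` and `distribute`, the choice of MD5, and the shard count of 3 are all assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest from hashlib is used to keep placement repeatable.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical keys, in the same style as the examples in this cheatsheet.
keys = [f"user:{n}" for n in range(1000, 1012)]
placement = distribute(keys, 3)
for shard, owned in placement.items():
    print(shard, owned)
```

Plain modulo placement remaps most keys when the number of shards changes, which is why production systems commonly use consistent hashing or token ranges instead.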

Key-Value Databases

Concept	Description	Schema Example
Definition	Stores data as simple key-value pairs. Good for caching and real-time applications with simple lookups. Examples: Redis, Memcached.	Key: "user:1001" Value: "John Doe"
Typical Use Cases	Session management, caching, leaderboards and counters, token storage, quick retrieval by key.	Example: Key: "session:abc123" Value: `{"user_id": 1001, "expires": "2025-03-09T15:00:00"}`
Common Commands	Set, Get, Delete by key.	Redis Example: `SET user:1001 "John Doe"` `GET user:1001` `DEL user:1001`
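
To make the set/get/delete model and the session use case concrete, here is a small in-memory sketch with per-key expiry. `TinyKV` is a hypothetical class written for this cheatsheet, not a Redis client; it only mimics the idea of a key that expires after a time-to-live.

```python
import time


class TinyKV:
    """Toy in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired keys on read
            return None
        return value

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None


store = TinyKV()
store.set("user:1001", "John Doe")
store.set("session:abc123", '{"user_id": 1001}', ttl_seconds=1800)
print(store.get("user:1001"))     # John Doe
print(store.delete("user:1001"))  # True
print(store.get("user:1001"))     # None
```

Expired keys are removed lazily on read here to keep the sketch short; a real store would typically also evict them in the background.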

Document Databases

Concept	Description	Schema Example
Definition	Stores data as documents (usually JSON or BSON). Offers flexible schema and advanced querying. Examples: MongoDB, CouchDB, Firestore.	MongoDB Document Example: `{ "_id": 1001, "name": "John Doe", "email": "john@example.com", "orders": [ { "order_id": 500, "total": 89.99 } ] }`
Typical Use Cases	Content management systems, user profiles, event logging, any scenario requiring flexible data structures.	Collection: "users". Each document represents one user profile, and documents in the same collection can have different fields.
Common Commands	Insert, Find, Update, Delete (typical CRUD operations). Fields can be indexed to speed up queries.	MongoDB Example: `db.users.insertOne({ _id: 1001, name: "John Doe" })` `db.users.find({ _id: 1001 })` `db.users.updateOne({ _id: 1001 }, { $set: { email: "john@example.com" } })` `db.users.deleteOne({ _id: 1001 })`
Query Flexibility	Can query nested fields, arrays, and perform aggregations. Supports advanced operators like `$in`, `$lt`, `$regex`, etc.	Aggregation Example: `db.orders.aggregate([ { $match: { status: "shipped" } }, { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } } ])`
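
The aggregation example above filters documents and then sums a field per group. The sketch below reproduces that logic over plain Python dictionaries so the two stages are easy to follow. It does not use a MongoDB driver, and the sample `orders` data and the function name `total_spent_by_customer` are made up for illustration.

```python
from collections import defaultdict

# Made-up sample documents with the fields the pipeline refers to.
orders = [
    {"customerId": 1001, "status": "shipped", "amount": 89.5},
    {"customerId": 1001, "status": "shipped", "amount": 10.5},
    {"customerId": 1002, "status": "pending", "amount": 25.0},
    {"customerId": 1003, "status": "shipped", "amount": 40.0},
]


def total_spent_by_customer(docs, status="shipped"):
    """Emulate a $match on status followed by a $group with $sum."""
    totals = defaultdict(float)
    for doc in docs:
        if doc.get("status") == status:                 # $match stage
            totals[doc["customerId"]] += doc["amount"]  # $group + $sum
    return [{"_id": cid, "totalSpent": total} for cid, total in totals.items()]


for row in total_spent_by_customer(orders):
    print(row)
```

In a real deployment the database runs these stages server-side, so only the grouped totals travel over the network.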

Column-Family Databases

Concept	Description	Schema Example
Definition	Organizes data into column families (tables) whose rows can have varying sets of columns. Optimized for reading/writing large volumes of data. Examples: Cassandra, HBase.	Cassandra Table Example: Keyspace: my_app Table: users Primary key: (user_id) `CREATE TABLE users ( user_id int, name text, email text, PRIMARY KEY (user_id) );`
Typical Use Cases	High write throughput, large-scale analytics, time-series data, event logging with predictable query patterns.	Example: For storing sensor readings, partition by sensor ID and cluster by timestamp, so each partition holds one sensor's readings in time order.
Common Commands	CQL (Cassandra Query Language) resembles SQL but has no joins. You define tables, insert/update data, and query by partition keys and clustering keys.	Cassandra Example: `INSERT INTO users (user_id, name, email) VALUES (1001, 'John Doe', 'john@example.com');`
Partitioning	Data is distributed across the cluster by partition key for horizontal scalability. Careful design of partition keys is crucial for performance.	Partition Key Example: `PRIMARY KEY ((user_id), some_other_key)`, where `user_id` is the partition key and `some_other_key` is a clustering key that orders rows within the partition.
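
The roles of the partition key and a clustering key can be sketched with a toy in-memory model: the partition key selects a bucket, and rows inside the bucket stay sorted by the clustering key so range reads are cheap. `WideRowTable` and the sensor data are hypothetical and only illustrate the layout; this is not how Cassandra stores data on disk.

```python
import bisect
from collections import defaultdict


class WideRowTable:
    """Toy model of a table with PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self):
        # partition_key -> list of (clustering_key, value), kept sorted by clustering_key
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        idx = bisect.bisect_left(rows, (clustering_key,))
        if idx < len(rows) and rows[idx][0] == clustering_key:
            rows[idx] = (clustering_key, value)  # same primary key: overwrite (upsert)
        else:
            rows.insert(idx, (clustering_key, value))

    def range_query(self, partition_key, start, end):
        """Return rows in one partition with start <= clustering_key < end."""
        rows = self._partitions.get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


readings = WideRowTable()
readings.insert("sensor-1", 300, 21.9)  # clustering key is a made-up timestamp
readings.insert("sensor-1", 100, 21.5)
readings.insert("sensor-1", 200, 21.7)
readings.insert("sensor-2", 100, 18.0)
print(readings.range_query("sensor-1", 100, 300))  # [(100, 21.5), (200, 21.7)]
```

A query that names the partition key touches only one bucket, which is the property that makes partition key design so important.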

Graph Databases

Concept	Description	Schema Example
Definition	Designed to store data as nodes and relationships (edges). Excellent for highly interconnected data. Examples: Neo4j, JanusGraph.	Neo4j Schema Example: Nodes: `(:Person { name: "John", age: 30 })` Relationship: `(John)-[:KNOWS]->(Jane)`
Typical Use Cases	Social networks, recommendation engines, fraud detection, network topologies, or anything requiring graph traversal.	Example: A "Friend of a Friend" search or shortest path between entities.
Common Commands / Query Language	Cypher (Neo4j), Gremlin (Apache TinkerPop). Cypher queries match patterns of node labels and relationships; Gremlin queries chain traversal steps.	Neo4j Cypher Query Example: `MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = "John" RETURN friend;`
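
The "Friend of a Friend" and shortest-path use cases above can be sketched with an adjacency list and a breadth-first search. This is plain Python rather than a graph database client, and the `knows` graph and both function names are invented for the example.

```python
from collections import deque

# Made-up directed KNOWS relationships: person -> people they know.
knows = {
    "John": ["Jane", "Raj"],
    "Jane": ["Maria"],
    "Raj": ["Maria", "Lee"],
    "Maria": [],
    "Lee": ["John"],
}


def friends_of_friends(graph, person):
    """People exactly two KNOWS hops away, excluding direct friends and the person."""
    direct = set(graph.get(person, []))
    result = set()
    for friend in direct:
        for fof in graph.get(friend, []):
            if fof != person and fof not in direct:
                result.add(fof)
    return result


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the first shortest path found, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(sorted(friends_of_friends(knows, "John")))  # ['Lee', 'Maria']
print(shortest_path(knows, "John", "Lee"))        # ['John', 'Raj', 'Lee']
```

A graph database expresses the same traversals declaratively and uses its storage layout to follow relationships without scanning unrelated data.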

General Best Practices

Practice	Description
Know Your Access Patterns	Design your schema (or data model) around how data is queried. This is critical in NoSQL to optimize performance.
Use Indexes Wisely	Indexes speed up reads but can slow writes and use additional memory/disk. Only index what you really need.
Partition / Shard Carefully	Distributing data evenly across the nodes of a cluster is important for performance. Avoid hotspots by choosing keys that won’t concentrate load on a single node.
Monitor and Tune	Monitor query performance, resource usage, and replication lag. Tweak configurations (e.g., read/write consistency levels) for your workload.
Security and Backups	Enable authentication/authorization, encrypt data at rest and in transit, and have a reliable backup/recovery strategy.
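
As an illustration of tuning read/write consistency levels, the sketch below checks the common quorum rule for replicated stores: with N replicas, a read of R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. It assumes a Dynamo-style replication model, and the function names are hypothetical, not part of any database API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True when every set of r read replicas shares a node with every set of w write replicas."""
    return r + w > n


n = 3  # assumed replication factor
q = quorum(n)
print(reads_overlap_writes(n, w=q, r=q))  # True: quorum writes with quorum reads
print(reads_overlap_writes(n, w=1, r=1))  # False: fast, but reads may be stale
print(reads_overlap_writes(n, w=n, r=1))  # True: write to all, read from one
```

Lower values of R and W reduce latency and improve availability at the cost of possibly stale reads, which is the trade-off to weigh against the workload.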